278 PART 5 Looking for Relationships with Correlation and Regression
Working with unequal observation intervals
In this fatal accident example, each of the 12 data points represents the accidents
observed during a one-year interval. But imagine analyzing the frequency of
emergency department visits among patients who have been treated for emphysema,
where there is one data point per patient. In that case, the width of the observation
interval may vary from one individual in the data to another. GLM lets you
provide an interval width along with the event count for each individual in the
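data. For arcane reasons, many statistical programs refer to this interval-width
variable as the offset. With a log link, R expects the offset on the log scale, so
you supply the logarithm of the interval width.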
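As a rough sketch of how an offset might be supplied in R, the following fits a Poisson model to per-patient visit counts with different follow-up times. The data frame ed, its columns (visits, years, treatment), and all of its values are invented for illustration and do not come from any real study.

```r
# Hypothetical data: one row per patient (all values invented for illustration)
ed <- data.frame(
  visits    = c(2, 0, 5, 1, 3, 0, 4, 1),                     # ED visits observed
  years     = c(1.0, 0.5, 2.5, 1.0, 2.0, 0.25, 3.0, 0.75),   # follow-up time
  treatment = factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
)

# offset(log(years)) enters the linear predictor with its coefficient fixed at 1,
# so the model describes visits per year of follow-up rather than raw counts.
fit_offset <- glm(visits ~ treatment + offset(log(years)),
                  family = poisson(link = "log"), data = ed)

# Exponentiated coefficients: the visit rate per year in group A (the intercept)
# and the rate ratio of group B relative to group A.
rates <- exp(coef(fit_offset))
print(rates)
```

With a single two-level predictor, the fitted rate for each group should match that group's total visits divided by its total follow-up time, which makes the output easy to check by hand.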
Accommodating clustered events
The Poisson distribution applies when the observed events are all independent
occurrences. But this assumption isn’t met if events occur in clusters. Suppose
you count individual highway fatalities instead of fatal highway accidents. In that
case, the Poisson distribution doesn’t apply, because one fatal accident may kill
several people. This is what is meant by clustered events.
The standard deviation (SD) of a Poisson distribution is equal to the square root of
the mean of the distribution. But if clustering is present, the SD of the data is
larger than the square root of the mean. This situation is called overdispersion.
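A small simulation can make the effect of clustering visible. The sketch below assumes, purely for illustration, an average of 20 fatal accidents per year and that each accident kills one person plus a Poisson-distributed number of additional people; neither figure comes from the accident data in this chapter.

```r
set.seed(1)
n_years <- 10000                      # number of simulated years (arbitrary)

# Independent events: fatal accidents per year follow a Poisson distribution
accidents <- rpois(n_years, lambda = 20)

# Clustered events: each accident kills 1 + Poisson(0.5) people (assumed),
# and the yearly fatality count is the sum over that year's accidents
fatalities <- vapply(
  accidents,
  function(k) sum(1 + rpois(k, lambda = 0.5)),
  numeric(1)
)

# Ratio of the SD to the square root of the mean:
# close to 1 for Poisson counts, noticeably above 1 under clustering
ratio_accidents  <- sd(accidents)  / sqrt(mean(accidents))
ratio_fatalities <- sd(fatalities) / sqrt(mean(fatalities))
print(c(accidents = ratio_accidents, fatalities = ratio_fatalities))
```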
GLM in R can correct for overdispersion if you designate the distribution family
quasipoisson rather than poisson, like this:
glm(formula = Accidents ~ Year, family = quasipoisson(link = "log"))
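To see what the quasipoisson family changes, it can help to fit both families to the same data and compare them. The sketch below uses made-up yearly counts in a hypothetical data frame named acc (these are not the accident data from this chapter), with Year coded as an index from 1 to 12.

```r
# Hypothetical yearly counts, invented for illustration only
acc <- data.frame(
  Year      = 1:12,                                            # year index
  Accidents = c(10, 14, 9, 22, 12, 30, 18, 41, 20, 52, 35, 70)
)

fit_p  <- glm(Accidents ~ Year, family = poisson(link = "log"),      data = acc)
fit_qp <- glm(Accidents ~ Year, family = quasipoisson(link = "log"), data = acc)

# Estimated dispersion parameter: values well above 1 suggest overdispersion
dispersion <- summary(fit_qp)$dispersion

# The coefficient estimates are the same under both families; only the
# standard errors change, inflated by the square root of the dispersion
se_p  <- coef(summary(fit_p))[, "Std. Error"]
se_qp <- coef(summary(fit_qp))[, "Std. Error"]
print(dispersion)
print(cbind(poisson = se_p, quasipoisson = se_qp))
```

Because the point estimates do not change, the fitted trend line is the same under both families; what changes is how much uncertainty is attached to it.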
FIGURE 19-5: Linear and exponential trends fitted to accident data.
© John Wiley & Sons, Inc.